A. Why Multi-Agent?

Scaling from one brain to a team

Agenda

  • A. Why Multi-Agent? — The single-agent bottleneck (~10 min)
  • B. Specialist Roles — Researcher, Analyst, Writer (~15 min)
  • C. Orchestration Patterns — Pipeline, Hub-and-Spoke, Workspace (~20 min)
  • D. Parallelism — Async execution for independent tasks (~10 min)
  • E. Quality Gates — Review loops and conflict resolution (~15 min)
  • F. Wrap-up — Key takeaways & lab preview (~5 min)

The Single-Agent Bottleneck

Your ReactAgent from Session 1 is powerful, but has limits:

  1. Context window saturation — research + analysis + writing crams everything into one conversation
  2. Role confusion — “research thoroughly AND write concisely” produces mediocre results at both
  3. Sequential execution — one agent can only do one thing at a time

When Multi-Agent Actually Helps

Scenario                          Single Agent                Multi-Agent
Simple research question          Perfect                     Overkill
Deep research + polished report   Context overload by step 8  Specialists stay focused
Compare 3+ independent topics     Sequential, slow            Parallel research agents
Tasks needing quality gates       Agent grades its own work   Analyst reviews Researcher
User needs progress by stage      One blob at the end         Stream results per specialist

The #1 Mistake

Premature decomposition — creating 5 agents for a task that one agent handles fine. Always start with a single agent. Only split when you hit a bottleneck.

B. Specialist Roles

Same brain, different personalities

The Specialization Pattern

Each specialist is a ReactAgent with:

  • A focused system prompt that constrains its role
  • A curated tool set (researchers get search, writers get formatting)
  • Lower max_steps (specialists finish faster)

# The code is identical — only the prompt and tools differ
researcher = ReactAgent(model="gpt-4o", max_steps=8, system_prompt=RESEARCHER_PROMPT)
researcher.tools = [search, web_reader, document_retrieval]

The Specialization Recipe

Each specialist prompt must define:

Component         Purpose
Role boundary     “ONLY do X — never do Y”
Quality standard  What good output looks like
Failure behavior  What to do when the primary approach doesn’t work
Output format     Raw findings / structured analysis / polished prose

Specialist  Tools                                   Output
Researcher  search, web_reader, document_retrieval  Raw findings with citations
Analyst     Reasoning-only (no search)              Structured analysis with confidence ratings
Writer      Formatting-only (pure generation)       Polished document for target audience
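As a rough illustration of the recipe (the real prompts are written in Lab 2), a researcher prompt covering all four components might look like this; the wording is a sketch, not the lab solution:

```python
# Hypothetical researcher prompt covering the four recipe components.
# You will write the real version in Lab 2.
RESEARCHER_PROMPT = """You are a research specialist.

Role boundary: ONLY gather and cite information. Never analyze or write prose.
Quality standard: every finding must include a source citation.
Failure behavior: if search returns nothing useful, report the gap explicitly
instead of guessing.
Output format: a bulleted list of raw findings, each with its citation.
"""
```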

Tip

You will write the actual prompts in Lab 2.

C. Orchestration Patterns

Three ways to coordinate agents

Pattern 1: Pipeline (Relay Race)

Agents execute sequentially, each passing output to the next.

graph LR
    R["Researcher"] -->|findings| A["Analyst"]
    A -->|analysis| W["Writer"]
    W -->|report| O["Output"]

    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style O fill:#1C355E,stroke:#00C9A7,color:white

  • Pros: Simple, predictable, easy to debug
  • Cons: Slow (no parallelism), each agent waits for the previous one
  • Use when: Tasks have clear sequential dependencies

Pattern 2: Hub-and-Spoke (Supervisor)

A central Orchestrator assigns tasks and collects results.

graph TB
    O["Orchestrator<br/>(Supervisor)"]
    O -->|assign| R1["Researcher 1"]
    O -->|assign| R2["Researcher 2"]
    O -->|assign| A["Analyst"]
    R1 -->|results| O
    R2 -->|results| O
    A -->|analysis| O

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R1 fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style R2 fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E

  • Pros: Parallel execution, orchestrator controls flow
  • Cons: Orchestrator is a single point of failure
  • Use when: Tasks have independent sub-problems
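A minimal hub-and-spoke sketch: the orchestrator fans tasks out with `asyncio.gather` and collects results. The `worker` coroutine stands in for an async agent run:

```python
import asyncio

# Stand-in for an async agent run; a real worker would await model/tool I/O.
async def worker(name, task):
    await asyncio.sleep(0)
    return f"{name}: done({task})"

async def orchestrate(tasks):
    # Independent sub-problems run concurrently; the orchestrator is the
    # single coordination point (and the single point of failure).
    return await asyncio.gather(
        *[worker(f"researcher-{i}", t) for i, t in enumerate(tasks, 1)]
    )

results = asyncio.run(orchestrate(["topic A", "topic B"]))
```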

Pattern 3: Shared Workspace

Agents read from and write to a shared state.

graph TB
    R["Researcher"] -->|write| WS["Shared Workspace<br/>(entries by type)"]
    A["Analyst"] -->|read/write| WS
    W["Writer"] -->|read/write| WS
    WS -->|read| A
    WS -->|read| W

    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style WS fill:#1C355E,stroke:#00C9A7,color:white

  • Pros: Flexible, agents can iterate on shared context
  • Cons: Coordination complexity — who reads what and when?
  • Use when: Tasks require iterative refinement
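The workspace itself can be a small typed store that agents write to and read from. The entry types below (findings, analysis) are illustrative, not a fixed schema:

```python
from collections import defaultdict

# Shared workspace sketch: agents write typed entries and read by type.
class Workspace:
    def __init__(self):
        self._entries = defaultdict(list)

    def write(self, entry_type, content, author):
        self._entries[entry_type].append({"author": author, "content": content})

    def read(self, entry_type):
        return [e["content"] for e in self._entries[entry_type]]

ws = Workspace()
ws.write("findings", "fact 1", author="researcher")
ws.write("findings", "fact 2", author="researcher")
ws.write("analysis", "fact 1 implies X", author="analyst")
```

Recording the author alongside each entry is what lets a later quality gate trace a claim back to the agent that produced it.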

Our Architecture: Hybrid

We combine Hub-and-Spoke + Shared Workspace:

graph TB
    O["Orchestrator"]
    O -->|"Phase 1"| R["Researcher(s)"]
    R -->|write| WS["Shared Workspace"]
    O -->|"Phase 2"| A["Analyst"]
    WS -->|read| A
    A -->|write| WS
    O -->|"Phase 3"| W["Writer"]
    WS -->|read| W
    W -->|write| WS
    O -->|"Phase 4"| QG["Quality Gate"]

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style WS fill:#F0F4F8,stroke:#1C355E,color:#1C355E
    style QG fill:#FF7A5C,stroke:#1C355E,color:#1C355E

D. Parallelism

Running independent tasks concurrently

The Sequential Problem

# Sequential: 6+ seconds for 3 independent topics
result_1 = researcher.run("Research topic A")   # 2s
result_2 = researcher.run("Research topic B")   # 2s
result_3 = researcher.run("Research topic C")   # 2s

# Parallel: ~2 seconds for all 3 topics
results = await asyncio.gather(
    run_agent_async(researcher, "Research topic A"),
    run_agent_async(researcher, "Research topic B"),
    run_agent_async(researcher, "Research topic C"),
)

3x speedup with zero quality loss.
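The `run_agent_async` helper used above isn't defined yet. One plausible shape, assuming `agent.run` is a blocking synchronous call, is to offload it to a worker thread with `asyncio.to_thread`:

```python
import asyncio

# Hypothetical helper: run a synchronous agent in a worker thread so
# several runs can proceed concurrently without blocking the event loop.
async def run_agent_async(agent, task):
    return await asyncio.to_thread(agent.run, task)

# Stand-in agent for demonstration:
class FakeAgent:
    def run(self, task):
        return f"researched: {task}"

async def main():
    agent = FakeAgent()
    return await asyncio.gather(
        run_agent_async(agent, "topic A"),
        run_agent_async(agent, "topic B"),
    )

results = asyncio.run(main())
```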

asyncio.gather in Practice

# 1. Native Async (Preferred)
# Agent calls are coroutines and don't block the loop
results = await asyncio.gather(
    *[self._run_agent_with_retry(task) for task in research_tasks]
)

# 2. Bridge Approach (For synchronous legacy code)
# Wrap blocking calls in a thread pool to avoid freezing the loop
loop = asyncio.get_running_loop()
results = await asyncio.gather(
    *[loop.run_in_executor(None, self._run_agent, t) for t in tasks]
)

Async vs Sync

Use native await for async agents. If you have synchronous operations (like litellm.completion or requests), you must offload them to a thread pool via run_in_executor (or asyncio.to_thread) so the event loop stays responsive.

When to Parallelize

Scenario                         Parallelize?  Why
Research Topic A and Topic B     Yes           Independent, no shared state
Research then Analyze            No            Analysis depends on research results
Analyze 3 sources independently  Yes           Each analysis is independent
Write then Review                No            Review depends on the draft

Rule: Parallelize tasks that share no data dependencies.
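This rule can be mechanized: given a dependency map, group tasks into "waves" where each wave has no internal dependencies, then run each wave with `asyncio.gather`. A sketch (the scheduling function and task names are illustrative):

```python
# Sketch: group tasks into waves with no internal dependencies,
# so each wave can be run concurrently with asyncio.gather.
def schedule_waves(deps):
    """deps maps each task to the set of tasks it depends on."""
    done, waves = set(), []
    pending = dict(deps)
    while pending:
        ready = [t for t, d in pending.items() if d <= done]
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(sorted(ready))
        done.update(ready)
        for t in ready:
            del pending[t]
    return waves

waves = schedule_waves({
    "research A": set(), "research B": set(),
    "analyze": {"research A", "research B"},
    "write": {"analyze"},
})
```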

E. Quality Gates

Review loops and conflict resolution

The Review Loop

After the Writer produces a draft, the Analyst reviews it:

graph LR
    W["Writer<br/>produces draft"] --> A["Analyst<br/>reviews draft"]
    A -->|"APPROVED"| F["Final Output"]
    A -->|"revision notes"| W

    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style F fill:#00C9A7,stroke:#1C355E,color:#1C355E

The Analyst checks for:

  1. Factual accuracy against the research
  2. Missing important points
  3. Unsupported claims (low-confidence presented as fact)
  4. Structural issues

Implementing the Quality Gate

async def _quality_gate(draft, max_revisions):
    for revision in range(max_revisions):
        # `evaluate` and `revise` are placeholder method names;
        # you implement the real calls in Lab 2
        review = analyst.evaluate(draft)

        if review.approved:
            return draft                        # Approved — ship it

        draft = writer.revise(draft, feedback=review)

    return draft                                # Ship after max revisions

Always Set max_revisions

Without a limit, the Analyst and Writer can enter an infinite review loop. In production, 2-3 revisions is usually enough.

The Complete Workflow

sequenceDiagram
    participant O as Orchestrator
    participant R as Researcher(s)
    participant A as Analyst
    participant W as Writer

    O->>R: Phase 1: Research (parallel)
    R-->>O: Raw findings

    O->>A: Phase 2: Analyze findings
    A-->>O: Structured analysis

    O->>W: Phase 3: Write report
    W-->>O: Draft

    O->>A: Phase 4: Review draft
    A-->>O: "Revision needed"
    O->>W: Revise with feedback
    W-->>O: Revised draft
    O->>A: Review again
    A-->>O: "APPROVED"

F. Wrap-up

Key Takeaways

  1. Multi-agent is not always better — start with single agent, split at bottlenecks
  2. Specialization = prompt + tools — same ReactAgent code, different personality
  3. Three patterns: Pipeline, Hub-and-Spoke, Shared Workspace
  4. Parallelize independent tasks — asyncio.gather for 3x+ speedups
  5. Quality gates prevent bad output — but always set a max_revisions limit

Lab Preview: The Newsroom

Step 1: The Specialists

  • Build Researcher, Analyst, Writer
  • Configure system prompts and tools

Step 2: The Orchestrator

  • Implement MultiAgentOrchestrator
  • Parallel research with asyncio.gather

Step 3: The Quality Gate

  • Add the review loop
  • Set max_revisions limit
  • Test with a comparison query

Time: 75 minutes

Questions?

Session 2 Complete